Python: Add support for PEP-798 #21695

Draft
tausbn wants to merge 8 commits into main from tausbn/python-add-support-for-pep-798

@tausbn tausbn commented Apr 10, 2026

The PEP in question adds support for putting "list splats" (`*x`) inside list/set comprehensions and generator expressions (so, `[*x for x in xs]` and likewise with curly braces and parentheses), and similarly a "dictionary splat" `**d` inside dictionary comprehensions: `{**d for d in ds}`.

The actual tree-sitter-python support for this is fairly straightforward -- we just have to allow `list_splat` and `dictionary_splat` nodes in the appropriate places.

The difficulty comes when we have to transform the tree-sitter AST into a Python AST. Comprehensions are one of a few instances in the Python extractor where we actually desugar the syntax into a simpler form. Thus, for something like `[x + 1 for x in xs]`, the code that is actually emitted is roughly similar to

def _listcomp(it):
    for x in it:
        yield x + 1
list(_listcomp(xs))

In reality, we never materialise the function separately -- rather, it lives as a child of a `ListComp` / `SetComp` / `DictComp` / `GenExpr` node (which would replace the `list` call above). Also, the parameter `it` is actually called `.0` -- an otherwise impossible parameter name -- to make it stand out in the AST.

Now, the semantics of `[*x for x in xs]` is that we treat `x` as an iterable in its own right, and yield all of its elements in turn. This is exactly what `yield from` does in Python, so in this case we can simply use `yield from` in the desugaring.

Thus, something like `[*reversed(x) for x in xs]` would become

def _listcomp(it):
    for x in it:
        yield from reversed(x)
list(_listcomp(xs))

A similar scheme applies to set comprehensions and generator expressions.
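
As a rough sanity check of this desugaring, here is a sketch in plain Python (not extractor code). Since the splat syntax is not accepted by current Python releases, it compares the `yield from` version against the nested comprehension that yields the same elements. The variable names are made up for illustration.

```python
# Desugared form of [*reversed(x) for x in xs], as described above.
def _listcomp(it):
    for x in it:
        yield from reversed(x)

xs = [[1, 2], [3, 4, 5]]

# Nested comprehension that is valid today and flattens in the same order.
expected = [y for x in xs for y in reversed(x)]

# The same generator serves list and set comprehensions; only the
# consumer wrapped around it changes.
flattened_list = list(_listcomp(xs))
flattened_set = set(_listcomp(xs))
```

Both consumers see the elements of each `reversed(x)` in turn, which is the behaviour we want from the splat.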

For dictionary comprehensions, the setup is slightly different. Here, the current desugaring of `{k: v for k, v in ts}` (say) is

def _dictcomp(it):
    for k, v in it:
        yield (k, v)
dict(_dictcomp(ts))

That is, we pass out the elements of the dictionary comprehension as key-value tuples.

For `{**d for d in ds}`, then, we can do a similar `yield from` desugaring to the one above, but we need a bit more to get the yielded values to be tuples:

def _dictcomp(it):
    for d in it:
        yield from d.items()
dict(_dictcomp(ds))
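
To illustrate why the `.items()` call is needed, here is a small runnable sketch (plain Python with example data of my own, not extractor code). Iterating a dictionary directly yields only its keys, whereas `.items()` yields the `(key, value)` tuples that the surrounding `dict` call expects.

```python
# Desugared form of {**d for d in ds}, as described above.
def _dictcomp(it):
    for d in it:
        yield from d.items()

ds = [{"a": 1, "b": 2}, {"b": 3, "c": 4}]

merged = dict(_dictcomp(ds))

# Nested comprehension that is valid today and builds the same mapping.
nested = {k: v for d in ds for k, v in d.items()}

# Without .items(), iterating a dict would only give us its keys.
keys_only = list(ds[0])
```

Later dictionaries win on duplicate keys, just as with repeated keys in an ordinary dictionary comprehension.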

The plus side of all of this fiddly business is that we don't have to define any specific control-flow/data-flow for these new constructs. We get all of that for free.

Finally, we fix a bug that was present in the old parser (and dutifully recreated in the new one): for some reason we were yielding `(value, key)` tuples rather than `(key, value)`.
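
To show what the tuple order would mean if the desugared code were read literally, here is a sketch in plain Python. The function names are hypothetical, and the extractor never actually runs such code.

```python
pairs = [("a", 1), ("b", 2)]

# Intended order: (key, value).
def _dictcomp_fixed(it):
    for k, v in it:
        yield (k, v)

# Buggy order: (value, key).
def _dictcomp_swapped(it):
    for k, v in it:
        yield (v, k)

fixed = dict(_dictcomp_fixed(pairs))
swapped = dict(_dictcomp_swapped(pairs))
```

With the swapped order, the resulting mapping is inverted: the values become the keys.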

tausbn added 8 commits April 14, 2026 13:27
This is the easy part -- we just allow `dictionary_splat` or
`list_splat` to appear in the same place as the expression.

First, we extend the various location overriding hacks to also accept
list and dict splats in various places. Having done this, we then have
to tackle how to actually desugar these new comprehension forms (as this
is what we currently do for the old forms).

As a reminder, a list comprehension like `[x for x in y]` currently gets
desugared into a small local function, something like

```python
def listcomp(a):
    for x in a:
        yield x
listcomp(y)
```

For `[*x for x in y]`, the behaviour we want is that we unpack `x`
before yielding its elements in turn. This is essentially what we would
get if we were to use `yield from x` instead of `yield x` in the above
desugaring, so that's what we do. This also works for set
comprehensions.

For dict comprehensions, it's slightly more complicated. Here, the
generator function instead yields a stream of `(key, value)` tuples.
(And apparently the old parser got this wrong and emitted `(value, key)`
pairs instead, which we faithfully recreated in the new parser as well.
We fix that bug in both parsers while we're at it). So, a bare `yield
from` is not enough: we also need a `.items()` call to get the
double-starred expression to emit its items as a stream of tuples (that
we then `yield from`).

To make this (hopefully) less verbose in the implementation, we defer
the decision of whether to use `yield` or `yield from` by introducing a
`yield_kind` scoped variable that determines the type of the actual AST
node. And of course for dict comprehensions with unpacking we need to
synthesise the extra machinery mentioned above.
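
The idea of deferring the node type can be sketched with Python's own `ast` module. This is only an analogy under stated assumptions, not the extractor's implementation: `YIELD_NODES` and `build_comp_function` are hypothetical names, and `yield_kind` is modelled here as a plain string argument rather than a scoped variable.

```python
import ast

# Hypothetical mapping from a `yield_kind` value to the AST node type to emit.
YIELD_NODES = {"yield": ast.Yield, "yield_from": ast.YieldFrom}

def build_comp_function(elt_src, target_src, yield_kind):
    # Parse a template whose loop body is a placeholder `pass`.
    module = ast.parse(
        f"def _comp(it):\n    for {target_src} in it:\n        pass\n"
    )
    loop = module.body[0].body[0]
    # Parse the element expression on its own.
    elt = ast.parse(elt_src, mode="eval").body
    # Only at this point do we decide between Yield and YieldFrom.
    loop.body[0] = ast.Expr(value=YIELD_NODES[yield_kind](value=elt))
    ast.fix_missing_locations(module)
    namespace = {}
    exec(compile(module, "<desugared>", "exec"), namespace)
    return namespace["_comp"]

plain = build_comp_function("x + 1", "x", "yield")
unpacking = build_comp_function("reversed(x)", "x", "yield_from")
```

In this sketch, the dict case with unpacking would pass something like `"d.items()"` as the element together with `"yield_from"`, which corresponds to the extra machinery mentioned above.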

On the plus side, this means we don't have to mess with control-flow, as
the existing machinery should be able to handle the desugared syntax
just fine.

This change reflects the `(value, key)` to `(key, value)` fix in an
earlier commit.
@tausbn tausbn force-pushed the tausbn/python-add-support-for-pep-798 branch from 7922e06 to 8b1ecf0 Compare April 14, 2026 11:27